Network Traffic Analysis: Hadoop Pig Vs Typical Mapreduce
نویسنده
چکیده
Big data analysis has become much popular in the present day scenario and the manipulation of big data has gained the keen attention of researchers in the field of data analytics. Analysis of big data is currently considered as an integral part of many computational and statistical departments. As a result, novel approaches in data analysis are evolving on a daily basis. Thousands of transaction requests are handled and processed everyday by different websites associated with e-commerce, e-banking, e-shopping carts etc. The network traffic and weblog analysis comes to play a crucial role in such situations where Hadoop can be suggested as an efficient solution for processing the Netflow data collected from switches as well as website access-logs during fixed intervals.
منابع مشابه
Network Traffic Analysis: Hadoop Pig vs Typical MapReduce
Big data analysis has become much popular in the present day scenario and the manipulation of big data has gained the keen attention of researchers in the field of data analytics. Analysis of big data is currently considered as an integral part of many computational and statistical departments. As a result, novel approaches in data analysis are evolving on a daily basis. Thousands of transactio...
متن کاملA Comparative Survey Based on Processing Network Traffic Data Using Hadoop Pig and Typical Mapreduce
Big data analysis has now become an integral part of many computational and statistical departments. Analysis of peta-byte scale of data is having an enhanced importance in the present day scenario. Big data manipulation is now considered as a key area of research in the field of data analytics and novel techniques are being evolved day by day. Thousands of transaction requests are being proces...
متن کاملPhurti: Application and Network-aware Flow Scheduling for Mapreduce
Traffic for a typical MapReduce job in a datacenter consists of multiple network flows. Traditionally, network resources have been allocated to optimize network-level metrics such as flow completion time or throughput. Some recent schemes propose using application-aware scheduling which can reduce the average job completion time. However, most of them treat the core network as a black box with ...
متن کاملMaximizing Data Locality in Hadoop Clusters via Controlled Reduce Task Scheduling
The overall goal of this project is to gain a hands-on experience with working on a large open-ended research-oriented project using the Hadoop framework. Hadoop is an open source implementation of MapReduce and Google File System, and is currently enjoying wide popularity. Students will modify the task scheduler of Hadoop, conduct several experimental studies, and analyze performance and netwo...
متن کاملFrom SPARQL to MapReduce: The Journey Using a Nested TripleGroup Algebra
MapReduce-based data processing platforms offer a promising approach for cost-effective and Web-scale processing of Semantic Web data. However, one major challenge is that this computational paradigm leads to high I/O and communication costs when processing tasks with several join operations typical in SPARQL queries. The goal of this demonstration is to show how a system RAPID+, an extension o...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2013